34 research outputs found

Hyracks: A flexible and extensible foundation for data-intensive computing

Author: Michael Carey
Nicola Onose
Raman Grover
Rares Vernica
Vinayak Borkar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Hyracks is a new partitioned-parallel software platform designed to run data-intensive computations on large shared-nothing clusters of computers. Hyracks allows users to express a computation as a DAG of data operators and connectors. Operators operate on partitions of input data and produce partitions of output data, while connectors repartition operators' outputs to make the newly produced partitions available at the consuming operators. We describe the Hyracks end user model, for authors of dataflow jobs, and the extension model for users who wish to augment Hyracks' built-in library with new operator and/or connector types. We also describe our initial Hyracks implementation. Since Hyracks is in roughly the same space as the open source Hadoop platform, we compare Hyracks with Hadoop experimentally for several different kinds of use cases. The initial results demonstrate that Hyracks has significant promise as a next-generation platform for data-intensive applications.

CiteSeerX

Crossref

AsterixDB: A Scalable, Open Source BDMS

Author: Alsubaiee Sattam
Altowim Yasser
Altwaijry Hotham
Behm Alexander
Borkar Vinayak
Bu Yingyi
Carey Michael
Cetindil Inci
Cheelangi Madhusudan
Faraaz Khurram
Gabrielova Eugenia
Grover Raman
Heilbron Zachary
Kim Young-Seok
Li Chen
Li Guangqiang
Ok Ji Mahn
Onose Nicola
Pirzadeh Pouria
Tsotras Vassilis
Vernica Rares
Wen Jian
Westmann Till
Publication date: 02/07/2014

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.

arXiv.org e-Print Archive

CiteSeerX

Inverse Functions in the AquaLogic Data Services Platform

Author: Nicola Onose
Publication date: 01/04/2008

When integrating data from heterogeneous sources, it is often necessary to transform both the schemas and the data from the underlying sources in order to present the integrated data in the form desired by its consuming applications. Unfortunately, these transformations, particularly if implemented by custom code, can block query optimization and updates, leading to potentially severe performance and functionality limitations. To circumvent these problems, the BEA AquaLogic Data Services Platform provides support for user-defined inverse functions. This paper describes the motivation, design, user experience, and implementation associated with inverse functions in ALDSP. This functionality debuted in version 2.1 of ALDSP in March 2006.

CiteSeerX

Uncovering the full potential of data services

Author: Onose Nicola Dan
Publication venue: eScholarship, University of California
Publication date: 01/01/2009

Making use of available services when building Web applications is a major challenge for today's developers. I address this challenge by using a declarative interface for data-centric Web services (aka data services), which are published as queries over a source schema. Programmers simply write queries over the source schema and rely on the system to automatically translate them to calls to existing data services. Thus, programmers can focus on extracting the data they need, without having to understand the definition or the implementation of each individual service. This dissertation discusses the main underlying technical problem, that of deciding whether a query can be translated into service calls. We consider two settings: when the system cannot do any post-processing and hence can issue only one service call (I call that expressibility) and when it is able to issue several calls and combine the results (I call it support). Expressibility and support are studied both for services that are listed individually and for compactly represented services (using grammar-like or Datalog formalisms). I also present contributions to extending the underlying service infrastructure with new features, several of which were added to the Distributed XQuery (DXQ) framework. DXQ is an XML query and scripting language with support for side effects, distribution, and parallelism, which I also used as an implementation platform for workflow languages.

Ezid

eScholarship - University of California

XQuery at Your Web Service

Author: Jerome Simeon
Nicola Onose
Publication date: 01/01/2004

XML messaging is at the heart of Web services, providing the flexibility required for their deployment, composition, and maintenance. Yet, current approaches to Web services development hide the messaging layer behind Java or C# APIs, preventing the application from getting direct access to the underlying XML information. To address this problem, we advocate the use of a native XML language, namely XQuery, as an integral part of the Web services development infrastructure. The main contribution of the paper is a binding between WSDL, the Web Services Description Language, and XQuery. The approach enables the use of XQuery for both Web services deployment and composition. We present a simple command-line tool that can be used to automatically deploy a Web service from a given XQuery module, and extend the XQuery language itself with a statement for accessing one or more Web services. The binding provides tight coupling between WSDL and XQuery, yielding additional benefits, notably the ability to use WSDL as an interface language for XQuery and the ability to perform static typing on XQuery programs that include Web service calls. Last but not least, the proposal requires only minimal changes to the existing infrastructure. We report on our experience implementing this approach in the Galax XQuery processor.

CiteSeerX

Crossref

ABSTRACT

Author: Mary Fernández
Nicola Onose

We present DXQ, an extension of XQuery to support the effective and efficient development of distributed XML applications. A DXQ program can invoke remote DXQ programs both synchronously and asynchronously and can dynamically ship DXQ code to execute at remote servers. We illustrate the power of the language with two distributed applications: the resolution algorithm of the Domain Name System (DNS) and the Narada overlay-network protocol. Our implementation permits concurrent evaluation of DXQ expressions at each server and can produce results extensionally (as XML values) or intensionally (as DXQ expressions).

CiteSeerX